List of Web archiving initiatives

This page contains a list of Web archiving initiatives worldwide. For easier reading, the information is divided into three tables: web archiving initiatives, archived data and access methods.

1 Web archiving initiatives
2 Archived data
3 Access methods
4 References

Web archiving initiatives

Name	Country	Creation Year	Technologies	Full-time Employees	Part-time Employees	Comments
Australia's Web Archive^[1]	Australia	1996	PANDORA Digital Archiving System (PANDAS), NLA Trove, HTTrack.	4	>4.25	It is a collaborative program of 11 agencies that provide an estimated average monthly staffing equivalent to 4 FTE. Outsourced IT support: 0.25 person-month. Whole Domain Harvests are conducted by the Internet Archive using Heritrix and the Wayback Machine.
Our digital island, a Tasmanian Web Archive^[2]	Australia	1996	HTTrack, Experimentally: Web Curator, Heritrix and Wayback Machine		1
PageFreezer ^[3]	Canada, US, Netherlands, Belgium	2005	PageFreezer's Deep Web Crawler, Lucene, Solr			Enterprise Class On Demand service to archive and replay websites, blogs, Ajax, Flash, video, audio & social media for litigation protection, eDiscovery and regulatory compliance with FDA, FINRA, FSA, SEC, SOX, Federal Rules of Evidence and records management laws.
Web@rchive Austria^[4]	Austria	2008	Archive-access tools and NetarchiveSuite.dk		2
DILIMAG (Digital Literature Magazines)^[5]	Austria	2007	WebCurator	2		One technician, one for collecting and metadata.
Government of Canada Web Archive (GCWA)^[6]	Canada	2005	Heritrix, Wayback Machine and Nutchwax.		2
Web Information Collection and Preservation - WICP (Chinese Web Archive)^[7]	China	2003	Heritrix, Wayback Machine and Nutchwax.
Croatian Web Archive (Hrvatski arhiv weba - HAW)^[8]	Croatia	2004	Lucene	4	3	2 librarians full time, 2 librarians part time, 1 IT professional (National and University Library in Zagreb), 1 or 2 IT professionals (from Zagreb University Computing Centre (Srce)- our partner)
WebArchiv (National Library of the Czech Republic)^[9]	Czech Republic	2000	Nutch, NutchWAX and WERA tools.	5		3.5 FTE library staff + approx. 1.5 FTE technical staff
Netarkivet.dk^[10]	Denmark	2005	NetarchiveSuite.dk and Heritrix.		18	18 people involved (developers, librarians, operations staff, project managers). All together 5 FTE.
Finnish Web Archive^[11]	Finland	2008	NutchWAX	2	>2	A group of librarians selects, on a part-time basis, what to archive from the Finnish web space.
BnF - BnF Web Legal Deposit^[12]	France	2006	Heritrix, Wayback Machine and NutchWAX. NetarchiveSuite.	9
Ina (Institut National de l'Audiovisuel)^[13]	France	2009	Crawl : PhagoSite, Croket, Heritrix / Access : Dowser	6		Staff of 80 documentalists taking part in nominating sites and QA
E-diaspora (Télécom ParisTech, FMSH)^[14]	France	2010	Crawl : PhagoSite	1		30 researchers taking part in nominating sites
Internet Memory Foundation (ATN service)^[15]	France, Netherlands	2004	IM large scale crawler (under development), Heritrix, Hanzo's crawler, IM Access software. Storage of Web Content: Hbase	21	0	11 people for quality crawls (QA, crawl engineering, project management), 9 developers & infrastructure, 1 manager.
Bibliotheksservice-Zentrum Baden-Württemberg^[16]	Germany	2003		7.5
Web archive of the German Bundestag^[17]	Germany	2005
Iceland^[18]	Iceland	2004	Heritrix, Wayback Machine
Japan Web Archiving Project^[19]	Japan	2004	Heritrix, Solr. Previously: Wget, Accela BizSearch	10	2	Launched in April 2004 as a pilot project, WARP (Web Archiving Project) has been in full-scale operation since July 2007.^[20]
National Library of Korea - OASIS (Online Archiving & Searching Internet Sources)^[21]	Korea	2001	Own system based on Oracle DBMS and specialized search engine (IRS) that performs data management and search function.	3	11
Koninklijke Bibliotheek^[22]	Netherlands	2006	Heritrix, KB e-Depot system	1	~7
National Library of Latvia^[23]	Latvia	2005	Heritrix		1	Currently only storing for preservation; public access is in development (ETA June 2012). The Latvian term for web harvesting is "rasmošana".
New Zealand Web Archive^[24]	New Zealand	1999	Wayback Machine	3	>10	3-4 people at the National Library (various hours) and 2 people at the Internet Archive during the time of domain harvests. Selective web archiving = 3 full time staff. Technical services = 1 staff member responds to technical problems when they arise. National Digital library = 2-3 staff members ad hoc. NDHA (National Digital Heritage Archive) = various staff members respond to web archiving issues as they arise.
The National Library of Norway^[25]	Norway
Portuguese Web Archive^[26]	Portugal	2007	Heritrix, Wayback Machine, NutchWAX	4	1
Web archive of Čačak^[27]	Serbia	2009	HTTrack		1
Web Archive Singapore^[28]	Singapore		Wayback Machine, Heritrix, NutchWAX, WERA
Slovenian Web Archive^[29]	Slovenia	2007	Heritrix, Wayback Machine	1
Digital Preservation of .ES domain^[30]	Spain	2006	Internet Archive	2	>2	Can pool additional resources if necessary from computing controllers and financial department.
Digital Heritage of Catalonia^[31]	Spain	2006	Heritrix, Wayback Machine, WERA, Nutchwax and Web Curator.	4
Basque Digital Heritage Archive^[32]	Spain	2008	Heritrix, Wayback Machine, Nutchwax and Web Curator.	1
Sweden (Kulturarw3)^[33]	Sweden	1996	Heritrix. Own system for storage, maintenance and access		1.25	Pause in operation from November 2009 to May 2011.
Aleph Archives^[34]	Switzerland/USA	2010	Distributed crawler, ArchiView access plugin, High performance search engine, Near real time indexing, Web Monitoring tools	7		Enterprise-grade Web archiving platform for online heritage (content, brands) preservation and eDiscovery, aimed at corporations, institutions, and the legal and government sectors seeking to preserve their web content regardless of type (websites, wikis, social media, forums...).
Web Archive Switzerland^[35]	Switzerland	2008	Heritrix, Wayback Machine		3	1 crawl engineer, 1 person for quality assurance, 1 coordinator. The curators, who do the selection, are partner libraries all over Switzerland.
NTU Web Archiving System, NTUWAS^[36]	Taiwan	2007	Lucene		3
Web Archive Taiwan^[37]	Taiwan	2007
The UK Web Archive^[38]	UK	2004	Heritrix, Web Curator Tool, Wayback Machine and moving to Solr for searching.
Hanzo Archives^[39]	UK	2006	Hanzo Crawler, Search, and Access Tools.			Commercial web archiving services and appliances, for government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA.
UK Government Web Archive^[40]	UK	2004	ATN Service	4	2	Technical side of our web archiving operation is contracted out to the Internet Memory Foundation so the figures account for QA, curatorial and management staff only
Internet Archive (provides Archive-it service)^[41]	USA	1996	Heritrix, Wayback Machine, NutchWAX and other tools developed by the Internet Archive	12
Reed Technology Web Archiving Services^[42]	USA	2010	TrueArchive™ Technology			Reed Technology Web Archiving Services provides support for Litigation Protection, Compliance, e-Discovery and Social Media Management.
Columbia University Libraries Web Resources Collection Program^[43]	USA	2009	Archive-it service	3	>1	Part-time consultation/supervision from other librarians adding up to about 1 FTE.
North Carolina State Government Web Site Archives^[44]	USA	2005	Archive-it service		3
Latin American Web Archiving Project^[45]	USA	2005	Archive-it service
Web Archiving Project for the Pacific Islands^[46]	USA		Archive-it service		4
Library of Congress Web Archives^[47]	USA	2000	Heritrix, Wayback Machine, and the DigiBoard, an in-house curatorial/permissions tool	6	80	The part time workers spend a few hours per month (on average) selecting content for the collections.
Harvard University Library: the Web Archive Collection Service (WAX)^[48]	USA	2006	Own system based on Archive-access and other open-source tools.		>6	3 part time on IT support. External curators within 3 units but don't know the size of them.
Web Archiving Service from California Digital Library (WAS service)^[49]	USA	2005	Heritrix, Wayback Machine, NutchWAX	4	>1	The number of hours that curators devote to the service is very variable.
University of Michigan Web Archives Project^[50]	USA	2000	WAS service		2
University of Texas at San Antonio Web Archives^[51]	USA	2009	Archive-It		3	The number of hours varies depending on how the crawls are scheduled.
qumram^[52]	Switzerland	2010	Chronos Web Archiving Software Suite			Commercial web archiving software suite. Provides both harvesting and transactional web archiving. Allows integration with any repository (database, file system, electronic archive or records management system). Specializes in regulatory compliance.
SAPERION^[53]	Germany	2011	SAPERION ECM Web Content Archive			Commercial enterprise content management suite that specializes in regulatory compliance. The product provides both harvesting and transactional web archiving based on the integration of qumram's^[52] Chronos Web Archiving Software Suite. Web content is just another channel through which content reaches SAPERION. Others may be scanners, fax, e-mail, mobile devices, office suites or any other system that creates content, such as ERP systems.
Bibliotheca Alexandrina's Internet Archive	Egypt	2002	Heritrix, Wayback Machine	3		Current crawling interests: Egypt beyond January 25, Arab League ccTLDs

Archived data

Name	Archived Contents (millions)	Disk Space Occupied (TB)	Archive Format	TLD/Broad Crawls	Selective Crawls (Yes/No)	Comments
Australia's Web Archive^[1]	3100	104.5	ARC/WARC	.AU	Y	.AU crawls (2005-2009): 3 billion files (100 TB). Selective crawls (1996-today): 100 million files (4.5 TB). There are 3 copies of each content.
Our digital island, a Tasmanian Web Archive^[2]		0.336	HTTrack		Y	Preserves online contents related to Tasmania. ODI has operated since its inception under the assumption that web sites fall within the definition of ‘Book’ in the Tasmanian Library Act 1984.^[54] Thus, no permission to capture from publishers is required.
Web@rchive Austria^[4]	455	6.61	ARC	.AT	Y	A copy of the data will be stored in a high security data storage unit.
DILIMAG (Digital Literature Magazines)^[5]	0.03	0.996	ARC			Project from 2007-03-01 until 2010-12-23. The DILIMAG project collects, describes and archives digital German literary magazines.
Government of Canada Web Archive (GCWA)^[6]	170	7			Y	Selective crawls of the web domain of the Federal Government of Canada (.GC.CA)
Web Information Collection and Preservation - WICP (Chinese Web Archive)^[7]				.GOV.CN	Y	Harvests web pages about events that have a great influence on society, the economy and so on, and sites in the 'gov.cn' domain.
Croatian Web Archive (Hrvatski arhiv weba - HAW)^[8]	81	3.4			Y
WebArchiv (National Library of the Czech Republic)^[9]	526	24		.CZ	Y	Harvesting began in 2001.
Netarkivet.dk^[10]	6008	190	ARC/WARC	.DK	Y	It uses Heritrix and NetarchiveSuite.dk, which was developed by two Danish libraries.
Finnish Web Archive^[11]	494	23		.FI, .AX	Y	Also crawls contents hosted on machines physically located in Finland, independently from their domain.
BnF - BnF Web Legal Deposit^[12]	14000	200	ARC/WARC	.FR	Y
Ina (Institut National de l'Audiovisuel)^[13]	8400	56	DAFF	N	Y	The Digital Archive file format (DAFF) handles file redundancies. The size on disk takes into account compression and deduplication; the equivalent disk storage in compressed ARC format would be 665 TB.
E-diaspora (Télécom ParisTech, FMSH)^[14]	237	2	DAFF	N	N	The Digital Archive file format (DAFF) handles file redundancies. The size on disk takes into account compression and deduplication; the equivalent disk storage in compressed ARC format would be 10 TB.
Internet Memory Foundation (ATN service)^[15]		180	WARC	Can be done by partners	Y	Formerly European Archive.^[55] Provides the Archive The Net Service (ATN Service). Selective crawls (140 TB), Domain crawls (40 TB), expect to grow to 1PB in 2011. New datacenter and a new crawler in 2011.
Bibliotheksservice-Zentrum Baden-Württemberg^[16]		1	HTTrack		Y	The Bibliotheksservice-Zentrum Baden-Württemberg operates the following web archives: (1) Baden-Württembergisches Online-Archiv (BOA); (2) Saardok; (3) Literatur im Netz des Deutschen Literaturarchivs Marbach.^[56]
Web archive of the German Bundestag^[17]					Y	German Federal Parliament. Selective. Snapshots of www.bundestag.de and other web presences of the German Bundestag are taken at regular intervals or on particular occasions, and are available in the web archive.
Iceland^[18]
Japan Web Archiving Project^[19]	319.8	38.2	WARC	-	Y	15 TB of selective crawls based on permission (2002–2010). Started the web archiving of official institution sites based on the legislation from April 2010.
National Library of Korea - OASIS (Online Archiving & Searching Internet Sources)^[21]		24			Y	Requires consent before archiving. Targets 56,401 websites. Web archiving is managed under digital resource management systems. In 2011 the web archiving system will be rebuilt.
Koninklijke Bibliotheek^[22]		5	ARC		Y
New Zealand Web Archive^[24]	346	13		.NZ	Y	.NZ crawls: 105 million URLs (4.1 TB) in 2008, 170 million URLs (6.1 TB) in 2010. Selective crawls of 7 599 websites in the National Digital Heritage Archive (2.8 TB), 71 million contents estimated. Legal deposit covers born digital material (including websites).
The National Library of Norway^[25]
Portuguese Web Archive^[26]	889	25	ARC	.PT, .CV, .AO, .MZ	Y	TLD crawls and integration of external collections since 2007, selective crawls since 2010.
Web archive of Čačak^[27]	0.255	0.013	HTTrack		Y	Selective crawls of 130 sites related to the city of Čačak. Collaboration with the WebArchiv team from the National Library of the Czech Republic.
Web Archive Singapore^[28]				.SG	Y	Selective crawls of 1000 Singapore-related sites, with the written consent of the owners. Whole .SG domain archiving.
Slovenian Web Archive^[29]		1.5	WARC			Selective crawls
Digital Preservation of .ES domain^[30]	855	30	ARC	.ES		Collaboration with Internet Archive. Domain crawl of .ES, harvested quarterly. Not launched publicly yet.
Digital Heritage of Catalonia^[31]	200	7.7	ARC	.CAT	Y	In accordance with the general trend, the archive model is a hybrid system consisting of: mass compilation of open-access digital resources published on the Internet (.cat); systematic archiving of the web site output of Catalan organizations; and fostering of lines of research through themed integration of the digital resources pertaining to specific events in Catalan public life (elections, museums, etc.).
Basque Digital Heritage Archive^[32]	21	0.8	ARC		Y
Sweden (Kulturarw3)^[33]	1710	71.3	Multipart MIME	.se, Swedish .nu and geolocation for other tld's	Y	Bulk crawls approximately twice a year. Selective crawls of about 140 newspapers every day.
Aleph Archives^[34]		23	WARC, WARC2, ARC and HTTrack to WARC migration tools		Y	Enterprise-grade Web archiving platform for online heritage (content, brands) preservation and eDiscovery, aimed at corporations, institutions, and the legal and government sectors seeking to preserve their web content regardless of type (websites, wikis, social media, forums...).
Web Archive Switzerland^[35]		0.1	ARC		Y
NTU Web Archiving System, NTUWAS^[36]	200	14			Y
Web Archive Taiwan^[37]
The UK Web Archive^[38]		6.9	ARC		Y	Selective crawls with previous permission. Expect to run wholesale UK domain-scale crawls once Legal Deposit legislation is implemented in April 2011. The UKWA is a spin-off from the UK Web Archiving Consortium that ended in 2007.
Hanzo Archives^[39]		7	WARC		Y	Commercial web archiving services and appliances, for government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA.
UK Government Web Archive^[40]		32	ARC			The UKGWA is a spin-off from the UK Web Archiving Consortium that ended in 2007.
Internet Archive (provides Archive-it service)^[41]	150000	5500		World-wide	Y	Provides the Archive-it service and leads the Archive-access project (Internet Archive ARC access tools). Collection is mirrored at the Bibliotheca Alexandrina in Egypt.
Reed Technology Web Archiving Services^[42]
Columbia University Libraries Web Resources Collection Program^[43]	23.1	1.8	ARC/WARC		Y	Selective crawls with permission or notification; primarily thematic collections.
North Carolina State Government Web Site Archives^[44]	51.5	3.8	WARC		Y
Latin American Web Archiving Project^[45]					Y
Web Archiving Project for the Pacific Islands^[46]	5.5		ARC/WARC		Y	Includes sites of 18 countries.
Library of Congress Web Archives^[47]	5	230	ARC/WARC		Y	Formerly MINERVA. Selective crawls with notification and permission; primarily event and thematic collections.
Harvard University Library: the Web Archive Collection Service (WAX)^[48]	19	0.661	ARC		Y	Selective crawls with no previous authorization.
Web Archiving Service from California Digital Library (WAS service)^[49]	216	25.2	ARC/WARC	Can be done by partners	Y	Provides Web Archiving Service (WAS) to partners world-wide. Was developed at the California Digital Library.
University of Michigan Web Archives Project^[50]		0.65	ARC/WARC		Y	WAS service since 2010.
University of Texas at San Antonio Web Archives^[51]	26	1.135	ARC/WARC		Y	University administration, faculty and student sites; as well as selective captures on San Antonio and South Texas subject areas, including San Antonio organizations; San Antonio Online Journals and Blogs; Tejano and Conjunto music; Gay, Lesbian, Bisexual, Transgender and Queer Related Web sites in Texas, San Antonio and the Rio Grande Valley; Immigration/Borderlands; Mexican Cooking Blogs; San Antonio Restaurants; Renewable Energy in Texas; Rio Grande Valley Organizations; and Rio Grande Watershed and Texas Water Issues.

Access methods

Name	URL history (Yes/No)	Meta-data (catalog/advanced) search (Yes/No)	Full-text search (Yes/No)	Comments
Australia's Web Archive^[1]	N	Y	Y	Selected sites are publicly available through a directory structure. Domain harvests are not. The PANDORA Archive is indexed and searchable through the NLA's single search service Trove.^[57] The Australian Domain Harvests are full-text indexed but are not currently publicly available.
Our digital island, a Tasmanian Web Archive^[2]	Y	Y	N	Presents thumbnails generated through Html To Image supplemented in HTTrack. Information is organized in directory: A-Z Subject listing, A-Z Title listing.
Web@rchive Austria^[4]	Y	N	N	Only accessible on special terminals at the Austrian National Library. Presents thumbnail previews of archived pages and supports keyword search within URL.
DILIMAG (Digital Literature Magazines)^[5]	Y	Y	N	Metadata are publicly available; archived versions have free or restricted access depending on the rights holders' agreement. Full-text search was not implemented due to lack of resources.
Government of Canada Web Archive (GCWA)^[6]	Y	Y	Y	Technical details available.^[58]
Web Information Collection and Preservation - WICP (Chinese Web Archive)^[7]		Y		Archive content is only available on the intranet of the National Library of China. Some collections are publicly available, with meta-data search and browsable by collection.
Croatian Web Archive (Hrvatski arhiv weba - HAW)^[8]	Y	Y	Y
WebArchiv (National Library of the Czech Republic)^[9]	Y		Y	Due to copyright restrictions, only a limited number of archived websites for which agreements were signed with the publishers is available online. For other resources you can find out whether a given website was archived and the number of harvested versions. Unlimited access to all resources in WebArchiv is available from public terminals in the National Library.
Netarkivet.dk^[10]	Y	N	N	Online access granted only to researchers using a proxy solution that accesses an archive index. Soon it will set up user access through the Wayback Machine. It has established a framework for running batch jobs with the possibility of data mining.
Finnish Web Archive^[11]	Y	N	30% of material.	URL search is available, but access to contents is onsite only. Full-text search is available for 30% of the material.
BnF - BnF Web Legal Deposit^[12]	Y	N	15% of the collection	Accessible to authorized users of the BnF, through the reading rooms of the Research Library located in Paris and Avignon. Wayback Machine interface was translated to French. Full Text search only for a relatively small portion of the collection (15% of 200 TB) indexed by Internet Archive. No current full text search implemented in workflow. Builds special collection galleries based on a selection from the archive on a given topic.
Ina (Institut National de l'Audiovisuel)^[13]	Y	Y	Y	Full text indexing is based on Lucene. To accommodate results from frequent crawls (up to every 2 hours for home pages) clustering is operated to handle similar versions of pages
E-diaspora (Télécom ParisTech, FMSH)^[14]	Y	N	N	1381 sites are currently crawled to build an archive on migrants' usage of the web; social studies researchers have launched a long-running project based on this archive (http://ediasporas.ticmigrations.fr/). Ina is handling crawls and storage.
Internet Memory Foundation (ATN service)^[15]	Y	Y	Y	Provides access and search services according to partners policy.
Bibliotheksservice-Zentrum Baden-Württemberg^[16]	Y	Y	Y	Search available (under development).^[59]
Web archive of the German Bundestag^[17]	Y	N	N	The web archive itself consists of snapshots of www.bundestag.de and other websites. Navigation is possible by clicking on the years.^[60]
Iceland^[18]
Japan Web Archiving Project^[19]	Y	Y	Y	Public access to sites after permission of the site owners. Open access to important publications such as white papers.
National Library of Korea - OASIS (Online Archiving & Searching Internet Sources)^[21]	Y	Y	Y	100% of the archive is indexed. Enables search by topic classification (e.g. Religion, Science, Arts). Search available.^[61]
Koninklijke Bibliotheek^[22]				The web archive will become available online during the first half of the year 2010.
New Zealand Web Archive^[24]	Y	Y	N	Domain harvests are available to selected staff only, using Wayback, and are limited to URL searches. For selective harvests, each website is described in the catalogue (providing subject, author, title and URL searches) and can be viewed by the public via the Internet by clicking on the link to the archived copy. The websites themselves, however, are not indexed.
The National Library of Norway^[25]	N	Y		Sites are integrated in the Catalog. Left bar enables facet navigation with drill-down.^[62]
Portuguese Web Archive^[26]	Y	Y	Y	20% of the archive is indexed and an experimental full-text service is available. Archived data can be mined through a Hadoop platform.
Web archive of Čačak^[27]	N	N	N	Plans to develop a search engine in the future. One drawback of HTTrack is that it renames files during archiving, so the original structure of the website is lost, as well as the file names.
Web Archive Singapore^[28]
Slovenian Web Archive^[29]	Y	N	N	The archive is not public yet. Plans to implement full-text search.
Digital Preservation of .ES domain^[30]	Y (Future)		Y (Future)	Plan to grant access through computers available at a given hall.
Digital Heritage of Catalonia^[31]	Y	Y	Y	Full open access.
Basque Digital Heritage Archive^[32]	Y	Y	Y
Sweden (Kulturarw3)^[33]	Y	N	N	Public access through dedicated machines in the library building.
Aleph Archives^[34]	Y	Y	Y	The full-text search engine supports automatic metadata extraction and native results deduplication. Also included: antivirus checker (~250mil. pages/day), archive statistics, text summarizer, archive exports (PDF, PNG, TIFF), etc.
Web Archive Switzerland^[35]		Y (in 2011)	Y (in 2011)	The archived versions of the sites are not yet accessible. Web Archive Switzerland will be open to the public by spring 2011 - only access within the National Library and the partner libraries will be possible. The sites are being catalogued and the records are integrated in our library catalog Helveticat.^[63]
NTU Web Archiving System, NTUWAS^[36]	Y	Y	Y	Presents page thumbnails, archived pages mapped to geographical locations.
Web Archive Taiwan^[37]	Y	Y	Y
PageFreezer ^[3]	Y	Y	Y	Enterprise Class On Demand service to archive and replay websites, blogs, Ajax, Flash, video, audio & social media for litigation protection, eDiscovery and regulatory compliance with FDA, FINRA, FSA, SEC, SOX, Federal Rules of Evidence and records management laws. Used by government agencies and public listed corporations in Pharmaceutical, Food, Finance, Healthcare and Retail industry.
The UK Web Archive^[38]	Y	Y	N
Hanzo Archives^[39]	Y	Y	Y	Commercial web archiving services and appliances. Access includes full-text search, annotations, redaction, URL/History, archive policy and temporal browsing, and configurable metadata schema for advanced e-discovery applications. Used in government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA.
UK Government Web Archive^[40]	Y	Y	Y	Full text search is operational on the UK Government Web Archive.^[64] Users can browse the collection using a full A-Z list of all sites^[65] and a set of categories.^[66]
Internet Archive (provides Archive-it service)^[41]	Y	Y	Y	URL history is available for all archived data. Meta-data and full-text search are available only for selected crawls. Until 2002 it had a mining platform for research composed of Alexa Shell Perl Tools av_tools and p2 platform for parallel processing.^[67] It was replaced by a simpler, more direct access method that enables automatic access to files but provides no platform for processing.^[68]
Reed Technology Web Archiving Services^[42]
Columbia University Libraries Web Resources Collection Program^[43]	Y	Y	Y	Accessible through Archive-it service.^[69]
North Carolina State Government Web Site Archives^[44]	Y	Y	Y	Accessible through Archive-it service.^[69]
Latin American Web Archiving Project^[45]	Y	Y	Y	Content can be accessed via full-text search, or by browsing by country or by specialized sample collection.
Web Archiving Project for the Pacific Islands^[46]	Y	Y	Y	Supported by Archive-it service.
Library of Congress Web Archives^[47]	Y	Y	N	Access provided via http://lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html. Records in MODS (Metadata Object Description Schema) format.
Harvard University Library: the Web Archive Collection Service (WAX)^[48]	Y	Y	Y
Web Archiving Service from California Digital Library (WAS service)^[49]	Y	Y	Y	Access for private study, scholarship and research. Most archives built with WAS have not yet been published because it is up to the partners to decide if they want to provide access. There are 16 partners using the service and they have created over 80 web archives, only 30 are publicly accessible. NutchWAX performance did not permit full archive search. Upcoming transition to SOLR will permit both full archive and collection-specific full text search.
University of Michigan Web Archives Project^[50]	Y	Y	Y	Powered by the WAS from the California Digital Library.^[70] Access is public but usage is restricted for private study, scholarship and research.
University of Texas at San Antonio Web Archives^[51]	Y	Y	Y	Accessible through Archive-it service^[71] and the Texas Archival Repositories Online database^[72]
